Major League Baseball’s 2021 season has been marked by record-breaking statistics, particularly record-low ERAs, extremely high spin rates, and a record-breaking number of no-hitters only a few months into the season. Baseball is a game many consider boring for its lack of action… Notes: the average spin rate prior to June 1st was around 2280 rpm; mention the original steroid era; data runs from the start of the season to June 1 and from June 1 to July 17, from Baseball Savant; sinkers and curveballs.
For a long time MLB teams had what some would call a gentleman’s agreement over the use of foreign substances. This started as a way for pitchers to improve their grip on the ball. The substances have since become so sticky that, instead of just helping pitchers grip the ball, they make the ball stick to the hand for longer, which increases the spin rate and makes the pitch harder to hit. As spin increases, pitches also tend to arrive higher than hitters expect, adding to the difficulties they face. The spin on a pitch like the 4-seam fastball is backspin, which opposes the downward pull of gravity on the ball (think of the term ‘rising fastball’).
MLB came out against foreign substances, stating that starting on June 1st umpires would begin checking pitchers for them, and on June 16th announced that players caught using them would be removed from the game, suspended for 10 games, and fined. Many people have argued that these announcements won’t deter pitchers from using foreign substances. Yet many analysts have been saying that spin rates are going down and batting averages are going up. In this notebook we will examine the effect of MLB’s new enforcement and look at changes for individual players.
The data referenced was provided by Baseball Savant and retrieved using its search feature.
It’s important to note that, generally, as velocity increases so does spin rate. In this section we will compare pitchers’ spin rates against their velocities before and after June 1st.
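Because spin tends to rise with velocity, one rough way to compare pitchers on equal footing is to divide spin rate by velocity. The sketch below assumes a data frame with the same `velocity` and `spin_rate` columns used in this notebook. The `toy` values and the `spin_per_mph` column are invented for illustration and are not Baseball Savant data.

```r
library(dplyr)

# Hypothetical pitchers; these values are illustrative only
toy <- data.frame(
  pitcher   = c('A', 'B', 'C'),
  velocity  = c(90, 95, 100),       # mph
  spin_rate = c(2160, 2375, 2300)   # rpm
)

# Spin per unit of velocity (rpm per mph) for each pitcher
toy_ratio <-
  toy %>%
  mutate(spin_per_mph = spin_rate / velocity)

toy_ratio

# Linear correlation between velocity and spin rate in the toy data
cor(toy$velocity, toy$spin_rate)
```

With the real files, the same `mutate()` call could be applied to the before and after data frames, so that a drop in spin is not confused with a drop in velocity.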
library(ggplot2) ## loading packages
library(ggExtra)
library(dplyr)
library(tidyverse)
b4 <-
read.csv('b4June1.csv')
b4
b4gr <-
b4 %>%
ggplot(aes(x = velocity, y = spin_rate, color = total_pitches)) +
geom_point(stat = 'identity') +
xlab('Velocity (mph)') +
ylab('Spin Rate (rpm)') +
ggtitle('Spin Rate VS Velocity Prior to June 1st') +
labs(color = 'Total Pitches')
p1 <-
ggMarginal(b4gr, type = 'histogram')
p1
This data shows that pitchers’ average spin rates are generally concentrated between 2200 rpm and 2600 rpm, and that velocities range from the upper 80s to the mid 90s.
aft <-
read.csv('aftJune1st.csv')
aft
aftgr <-
aft %>%
ggplot(aes(x = velocity, y = spin_rate, color = total_pitches)) +
geom_point(stat = 'identity') +
xlab('Velocity (mph)') +
ylab('Spin Rate (rpm)') +
ggtitle('Spin Rate VS Velocity After June 1st') +
labs(color = 'Total Pitches')
p2 <-
ggMarginal(aftgr, type = 'histogram')
p2
p1
Based on this graph of the data following June 1st, there doesn’t seem to be much of a difference: velocities are concentrated between the upper 80s and lower 90s, and spin rates between 2000 rpm and 2500 rpm. On closer inspection of the scatterplot, most of the outliers appear to be darker-shaded dots representing pitchers who have thrown between 1 and 250 pitches. Because this data is based on averages, smaller sample sizes are generally less representative of the individual. This prompted me to check whether there was a difference when I raised the minimum number of pitches to 250.
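An alternative to dropping low-volume pitchers is to weight each pitcher by the number of pitches thrown, so that small samples count for less. This is a minimal sketch, assuming the `spin_rate` and `total_pitches` columns used above. The `toy` rows are invented for illustration.

```r
# Hypothetical rows: one low-volume outlier and two regular pitchers
toy <- data.frame(
  spin_rate     = c(3000, 2300, 2200),  # rpm
  total_pitches = c(10, 500, 490)
)

# The plain mean treats every pitcher equally, so the outlier pulls it up
plain_mean <- mean(toy$spin_rate)

# The weighted mean counts each pitcher in proportion to pitches thrown
weighted_mean <- weighted.mean(toy$spin_rate, w = toy$total_pitches)

plain_mean
weighted_mean
```

The gap between the two values gives a sense of how much the low-volume pitchers influence the league-wide picture.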
b4abv <-
b4%>%
filter(total_pitches > 249)
b4abv
aftabv <-
aft%>%
filter(total_pitches > 249)
aftabv
babvgr <-
b4abv %>%
ggplot(aes(x = velocity, y = spin_rate, color = total_pitches)) +
geom_point(stat = 'identity') +
xlab('Velocity (mph)') +
ylab('Spin Rate (rpm)') +
labs(color = 'Total Pitches',
title = 'Spin Rate vs Velocity Before June 1st',
caption = 'For pitchers with at least 250 pitches') +
scale_color_viridis_c()
afabvgr <-
aftabv %>%
ggplot(aes(x = velocity, y = spin_rate, color = total_pitches)) +
geom_point(stat = 'identity') +
xlab('Velocity (mph)') +
ylab('Spin Rate (rpm)') +
labs(color = 'Total Pitches',
title = 'Spin Rate vs Velocity After June 1st',
caption = 'For pitchers with at least 250 pitches') +
scale_color_viridis_c()
k1 <-
ggMarginal(afabvgr, type = 'histogram')
k1
g1 <-
ggMarginal(babvgr, type = 'histogram')
g1
Here we can see that…
It’s hard to have any conversation about foreign substances in baseball without mentioning Trevor Bauer. From initially speaking out against foreign substances, once even stating that they could be more powerful than steroids, to now being accused of having the most effective substance combination in the game, Bauer’s name has come up a lot.
bauer <-
read.csv('bauer advanced stats.csv')
bauer
tbju28 <-
bauer %>%
filter(game_date == '2021-06-28')
tbju28
tb28gr <-
tbju28%>%
ggplot(aes(x = at_bat_number , y = release_spin_rate, color = pitch_name)) +
geom_line(stat = 'identity')+
xlab('At Bat Number') +
ylab('Release Spin Rate') +
labs(title = "Trevor Bauer's Spin Rate by Pitch Type",
color = 'Pitch Name',
caption = 'From his last game on June 28th')
tb28gr
tbju6 <-
bauer %>%
filter(game_date == '2021-06-06')
tbju6
tb6gr <-
tbju6%>%
ggplot(aes(x = at_bat_number , y = release_spin_rate, color = pitch_name)) +
geom_line(stat = 'identity')+
#geom_smooth()+
xlab('At Bat Number') +
ylab('Release Spin Rate') +
labs(title = "Trevor Bauer's Spin Rate by Pitch Type",
color = 'Pitch Name',
caption = 'From game pitched on June 6th')
tb6gr
tb28gr
## pitch spin rt vs velocity
tbsvvg <-
tbju6%>%
ggplot(aes(x = release_speed, y = release_spin_rate, color = pitch_name)) +
geom_point(stat = 'identity')
tbsvvg
tbs2 <-
tbju28%>%
ggplot(aes(x = release_speed, y = release_spin_rate, color = pitch_name)) +
geom_point(stat = 'identity')
tbs2
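The plots compare the two starts by eye. A numeric summary of mean spin rate per pitch type and game can make the same comparison more direct. The sketch below assumes the `game_date`, `pitch_name`, and `release_spin_rate` columns used above. The `toy` rows and the `mean_spin` and `pitches` column names are invented for illustration and do not reflect Bauer's actual numbers.

```r
library(dplyr)

# Hypothetical pitch-level rows for two starts; values are illustrative only
toy <- data.frame(
  game_date         = c('2021-06-06', '2021-06-06', '2021-06-28', '2021-06-28'),
  pitch_name        = c('4-Seam Fastball', '4-Seam Fastball',
                        '4-Seam Fastball', '4-Seam Fastball'),
  release_spin_rate = c(2800, 2820, 2600, 2620)  # rpm
)

# Mean spin rate and pitch count per game and pitch type
toy_summary <-
  toy %>%
  group_by(game_date, pitch_name) %>%
  summarise(mean_spin = mean(release_spin_rate, na.rm = TRUE),
            pitches = n(),
            .groups = 'drop')

toy_summary
```

Applied to the full data frame with a `filter()` on the two game dates, this would give one row per pitch type per start.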
gcoledt <-
read.csv('Gerrit Cole advanced stats.csv')
gcoledt
gcju3 <-
gcoledt %>%
filter(game_date == '2021-06-03')
gcju3
gcjl10 <-
gcoledt %>%
filter(game_date == '2021-07-10')
gcjl10
gcju3gr <-
gcju3 %>%
ggplot(aes(x = at_bat_number, y = release_spin_rate, color = pitch_name)) +
geom_line(stat = 'identity') +
xlab('At Bat Number') +
ylab('Release Spin Rate') +
labs(title = "Gerrit Cole's Spin Rate by Pitch",
caption = 'From the game pitched on June 3rd',
color = 'Pitch Type')
gcju3gr
gcjl10gr <-
gcjl10 %>%
ggplot(aes(x = at_bat_number, y = release_spin_rate, color = pitch_name)) +
geom_line(stat = 'identity') +
xlab('At Bat Number') +
ylab('Release Spin Rate') +
labs(title = "Gerrit Cole's Spin Rate by Pitch Type",
caption = 'From the game Cole pitched on July 10th',
color = 'Pitch Type')
gcjl10gr
gcju3gr
grichdt <-
read.csv('Garrett Richards advanced stats.csv')
grichdt
grjul9 <-
grichdt %>%
filter(game_date == '2021-07-09')
grjul9
grjul9gr <-
grjul9 %>%
ggplot(aes(x = pitch_number, y = release_spin_rate, color = pitch_name)) +
geom_line(stat = 'identity') +
geom_point() +
#geom_smooth() +
xlab('Pitch Number') +
ylab('Release Spin Rate') +
labs(title = "Garrett Richards' Spin Rate by Pitch",
caption = 'For game pitched on July 9th',
color = 'Pitch Type')
grjul9gr